Abstract

Calculating  the  distance  from  camera  to  feature  point  by 
temporal  cross-correlation  of  pixels  in  a flat  image  plane  is  better 
done  using  simple  geometric  relations  than  using  time  differentials. 
Both  these  methods  of  computation  will  be  presented  and  contrasted. 
The  effect  of  measurement  error  and  other  inaccuracies  on  the 
determination  of  range  is  also  described. 
1 Introduction


An important problem in machine vision is to determine the
distance from camera to feature by appropriately interpreting a
two-dimensional image or sequence of images of that feature [1, 2, 3, 4, 5].
Several solutions have succeeded by exploiting a sequence of
images obtained through relative motion of camera and feature point
[6, 7, 8, 9, 10, 11, 12, 13]. Another method of determining distance,
called temporal cross-correlation [14], which exploits known motion,
shows promise. It is similar in principle to the Reichardt model of
human motion sensors, in which one calculates the time required for
a feature (e.g., a point source of light) to move between two small
detectors that are separated by a known and fixed distance [15].

The  problem  might  best  be  introduced  by  examining  figure  1 
and  the  definitions  within  it. 

[Figure 1. Recoverable label: $v\,\Delta t = v\,(t_1 - t_0)$]

The goal is to determine the perpendicular depth, h, to a small
(small enough to be nearly flat) yet finite-sized feature. We assume
a simple case in which the camera moves perpendicular to the optical
axis and there is only one component of camera motion. It is
assumed that the velocity of the camera is known and constant over
the time interval $t_0$ to $t_1$. Figure 1 indicates that pixels 0 and 1 are
symmetrically located in the image plane on either side of the optical
axis. We define d as the center-to-center distance between pixels.
The actual width of each of the two pixels will, typically, be less than
d. If one can determine the time, $\Delta t$, required for the feature to be
detected by the pixel (picture element) at position 1 on the flat image
plane after it is detected by the pixel at position 0, then h can be
found. Two methods of obtaining h will be given: a simple geometric
method and a time differential method.


2 Geometric  Method 


A geometric  method  for  finding  the  depth,  h,  falls  easily  out  of 
the  definitions  in  figure  1 by  noting  the  two  similar  triangles. 


$$\frac{h}{v\,\Delta t} = \frac{s}{d} \quad\Rightarrow\quad h = \frac{s\,v\,\Delta t}{d} \qquad (1)$$

$\Delta t$ is the only unknown in this expression and is obtained by the
method of temporal correlation as discussed in section 4.
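
As a quick numerical illustration of equation 1, the following sketch evaluates h for one set of values. The function name `depth_geometric` is chosen here for illustration, the sample numbers are those of the worked example in section 5, and s is assumed to be the lens-to-image-plane distance (its definition appears only in figure 1).

```python
def depth_geometric(s, v, dt, d):
    """Depth h from the similar-triangle relation h = s * v * dt / d.

    s  : lens-to-image-plane distance in metres (assumed meaning of s)
    v  : camera speed in m/s, assumed constant over dt
    dt : time for the feature to pass from pixel 0 to pixel 1, in seconds
    d  : centre-to-centre pixel separation in metres
    """
    if d <= 0:
        raise ValueError("pixel separation d must be positive")
    return s * v * dt / d


# Values from the example in section 5: s = 60 mm, d = 0.5 mm,
# v = 1 m/s, dt = 2.5 ms.
h = depth_geometric(s=0.060, v=1.0, dt=2.5e-3, d=0.5e-3)
print(f"h = {h:.3f} m")  # prints "h = 0.300 m"
```

All quantities are in SI units, so the result is in metres; the 30 cm depth of the section 5 example is recovered.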

3 Time  Differential  Method 

A time  differential  method  for  finding  the  depth,  h,  can  also  be 
derived  using  the  definitions  in  figure  1. 

A(t) is defined as the angle (at time t) between the camera
velocity vector, v, and the range vector, r(t), from lens to feature. h
is the perpendicular distance from camera to feature and is constant
with respect to time (see figure 1). Let $t_0 = -t_1$ in figure 1. This
implies (assuming a constant camera velocity) that $A(t = 0) = \pi/2$. As a
result of these definitions,
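
$$\cot(A(t)) = -\frac{vt}{h} \quad\Rightarrow\quad \frac{d}{dt}\cot(A(t)) = \frac{d}{dt}\left(-\frac{vt}{h}\right) \quad\Rightarrow\quad \frac{v}{h} = \frac{dA(t)}{dt}\,\csc^2(A(t)) \qquad (2)$$

From simple geometry,

$$\csc(A(t)) = \frac{r(t)}{h} \qquad (3)$$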


If we substitute equation 3 into equation 2, we get

$$\frac{v}{h} = \frac{dA(t)}{dt}\,\frac{r(t)}{h}\,\csc(A(t)) \quad\text{or}\quad r(t) = \frac{v\,\sin(A(t))}{dA(t)/dt} \qquad (4)$$


The accuracy of equation 4 for calculating r(t) is dependent on
whether one can accurately evaluate dA(t)/dt, which, among other
things, is dependent on the shape of A(t). From equation 2,

$$A(t) = \cot^{-1}\left(-\frac{vt}{h}\right) \qquad (5)$$

Referring back to figure 1, we see that $t_0 = -\Delta t/2$ and $t_1 = \Delta t/2$. Then
we can define $\Delta A(0) \triangleq A(t_1) - A(t_0)$. From equation 5 and figure 2 it can
be seen that A(t) is not a linear function of t. In fact,

[Equation 6 is not recoverable from the source.]

Using equation 4, and since the derivative is only approximated by
$\Delta A(t)/\Delta t$,

$$r(t) \approx \frac{v\,\Delta t\,\sin(A(t))}{\Delta A(t)} \qquad (7)$$


Referring to figure 1 again and letting t = 0,

$$r(0) = h \approx \frac{v\,\Delta t}{\Delta A(0)} \qquad (8)$$

Equation 8 says that the depth, h, to the feature is approximated by
the constant velocity of the camera times the time, $\Delta t$, it takes the
light from the feature to traverse $\Delta A(0)$ radians, divided by
$\Delta A(0)$. In addition, the expression $v\,\Delta t/\Delta A(0)$ in equation 8 will
always be greater than the correct value for the depth, h. This can
be seen from the shape of A(t) as depicted in figure 2.
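
The overestimate can also be checked numerically. The sketch below evaluates the time differential estimate for a feature at a known depth, taking A(t) as the inverse cotangent of -vt/h on the branch where A(0) = π/2. The function names, the choice of `math.atan2` for that branch, and the sample values (s = 60 mm, h = 30 cm, v = 1 m/s, borrowed from the example in section 5) are assumptions made for illustration.

```python
import math


def angle(t, v, h):
    """A(t) = arccot(-v*t/h), taken in (0, pi) so that A(0) = pi/2."""
    return math.atan2(h, -v * t)


def depth_time_differential(v, dt, h_true):
    """Estimate v*dt / dA(0), where dA(0) = A(dt/2) - A(-dt/2).

    h_true is used only to simulate the angle swept by the feature.
    """
    delta_a = angle(dt / 2, v, h_true) - angle(-dt / 2, v, h_true)
    return v * dt / delta_a


s, v, h_true = 0.060, 1.0, 0.30
for d in (0.5e-3, 5e-3, 50e-3):
    dt = h_true * d / (s * v)  # transit time implied by h = s*v*dt/d
    estimate = depth_time_differential(v, dt, h_true)
    excess = 100 * (estimate - h_true) / h_true
    print(f"d = {d * 1e3:5.1f} mm  estimate = {estimate:.6f} m  (+{excess:.4f} %)")
```

With these definitions ΔA(0) = 2 arctan(vΔt/2h), so the estimate equals h·u/arctan(u) with u = vΔt/2h, which exceeds h for every u > 0 and grows with the pixel separation d.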


4 Temporal  Correlation 


It has been shown [14] that $\Delta t$ (in equations 1 and 8) can be
found by cross-correlating the time sequences of intensity
output of the pixels at positions 0 and 1 on the image plane (see
figure 1). Each of these time sequences has N components. In this
sense, $\Delta t$ is equal to the shift required to match or "fit" one time
sequence with the other. In mathematical terms, if $x_n$ is the finite
series of discrete time samples of the analog output of the pixel at
position 0, and $y_n$ is the same for the pixel at position 1, then we
want to find the $p_{min} \in \{0, 1, \ldots\}$ that minimizes

$$\sum_{n=0}^{N-1} (x_n - y_{n+p})^2 \qquad (9)$$

over all non-negative integers p. If the sampling period is T, then
$\Delta t = p_{min} T$.
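
The search for the best shift can be sketched as a brute-force loop over candidate shifts. This is a minimal illustration, not the procedure of [14]: the function name `best_shift`, the upper bound `max_shift`, the requirement that the second sequence be recorded long enough for every shifted index to exist, and the synthetic pulse data are all assumptions introduced here.

```python
def best_shift(x, y, max_shift):
    """Return the shift p in 0..max_shift minimising sum_n (x[n] - y[n+p])**2.

    x holds the N samples from the pixel at position 0; y holds the samples
    from the pixel at position 1 and must contain at least N + max_shift
    values so that y[n + p] is always defined. Ties go to the smallest p.
    """
    n_terms = len(x)
    if len(y) < n_terms + max_shift:
        raise ValueError("y must hold at least len(x) + max_shift samples")
    best_p, best_cost = 0, float("inf")
    for p in range(max_shift + 1):
        cost = sum((x[n] - y[n + p]) ** 2 for n in range(n_terms))
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p


# Synthetic data: the same intensity pulse reaches pixel 1 four samples late.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0]
x = pulse + [0.0] * 3                    # N = 8 samples from pixel 0
y = [0.0] * 4 + pulse + [0.0] * 10       # delayed copy seen by pixel 1
T = 0.5e-3                               # hypothetical sampling period, seconds

p_min = best_shift(x, y, max_shift=8)
dt = p_min * T
print(f"p_min = {p_min}, dt = {dt * 1e3:.1f} ms")  # prints "p_min = 4, dt = 2.0 ms"
```

The resolution of Δt obtained this way is one sampling period T, which is one source of the quantization error discussed in section 5.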

5 Error  in  the  Distance  to  a Feature 

An advantage of equation 1 ($h = s\,v\,\Delta t/d$) is that there might be a
gain in the precision (and, perhaps, accuracy) of h (the
perpendicular distance to a feature) by making d bigger without
sacrificing the exactness of the expression. It is argued below that,
under certain conditions, there is some (possibly slight) gain in
precision by making d bigger.

Equation 1 is only theoretically exact. The true values of d
and s may not perfectly match the published specifications of the
manufacturers of the detector and camera. In addition, v is a
measured quantity that will have its own errors, dependent upon the
precision and accuracy of the velocity measuring equipment. $\Delta t$
results from the cross-correlation operation and is also a measured
quantity with its own sources of error (e.g., detector noise, amplifier
noise, poor focus on the feature, quantization, occlusion, and lack of
flatness in the feature).

We will now define these errors and analyze their effect on h.
For the remainder of this section, consider that h, s, d, v, and $\Delta t$ are
random variables with means $\mu_h$, $\mu_s$, $\mu_d$, $\mu_v$, $\mu_{\Delta t}$ and
variances $\sigma_h^2$, $\sigma_s^2$, $\sigma_d^2$, $\sigma_v^2$, $\sigma_{\Delta t}^2$.


In addition, it can be expected that the errors in s and d will
be fixed with respect to time and are therefore semi-correctable (i.e.,
one can calibrate the optical system by placing an object at a known distance
from the lens and making a sufficient number of measurements of h
in order to determine the correction factor). In this case, we can
assume that

$$\sigma_s^2 = \sigma_d^2 = 0 \qquad (10)$$

implying that s and d are deterministic. We will further assume that
the random variables, v and $\Delta t$, are independent since their errors
are caused by independent measurement systems. From equation 1
we get an expression for $\sigma_h^2$, the error in depth.


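$$\sigma_h^2 = \frac{s^2}{d^2}\left(\sigma_v^2\,\sigma_{\Delta t}^2 + \mu_{\Delta t}^2\,\sigma_v^2 + \mu_v^2\,\sigma_{\Delta t}^2\right) \qquad (11)$$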
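
The three-term structure that the text attributes to equation 11 follows from the variance of a product of two independent random variables. The intermediate steps below are not in the original; they are a sketch that assumes only equation 1, equation 10, and the independence of v and Δt.

```latex
\begin{align*}
\sigma_h^2 &= \operatorname{Var}\!\left(\frac{s\,v\,\Delta t}{d}\right)
            = \frac{s^2}{d^2}\,\operatorname{Var}(v\,\Delta t) \\
\operatorname{Var}(v\,\Delta t)
           &= E[v^2]\,E[\Delta t^2] - \mu_v^2\,\mu_{\Delta t}^2 \\
           &= (\sigma_v^2 + \mu_v^2)(\sigma_{\Delta t}^2 + \mu_{\Delta t}^2)
              - \mu_v^2\,\mu_{\Delta t}^2 \\
           &= \sigma_v^2\,\sigma_{\Delta t}^2
              + \mu_{\Delta t}^2\,\sigma_v^2
              + \mu_v^2\,\sigma_{\Delta t}^2
\end{align*}
```

The ordering of the last line matches the discussion that follows, in which the second term is the one that depends on the mean of Δt.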


Now we will show that an increase in d will cause a decrease in
$\sigma_h^2$. From figure 1 we see that, if we increase d to d', $\mu_{\Delta t}$ will
increase to, say, $\mu_{\Delta t'}$. In addition,

$$\frac{\mu_{\Delta t}}{d} = \frac{\mu_{\Delta t'}}{d'} \qquad (12)$$


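This gives a new expression for $\sigma_{h'}^2$, the new error in depth.

$$\sigma_{h'}^2 = \frac{s^2}{d'^2}\left(\sigma_v^2\,\sigma_{\Delta t}^2 + \mu_{\Delta t'}^2\,\sigma_v^2 + \mu_v^2\,\sigma_{\Delta t}^2\right) \qquad (13)$$

In equation 13, it is assumed that $\sigma_{\Delta t'}^2 \approx \sigma_{\Delta t}^2$, which is to say that the
variance of the random variable $\Delta t$ will be effectively independent of
the size of its mean.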

Using equation 12, we find that the second terms on the right
hand side of equations 11 and 13 are equal. Therefore, the first and
third terms on the right hand side of equations 11 and 13,
$s^2(\sigma_v^2\sigma_{\Delta t}^2 + \sigma_{\Delta t}^2\mu_v^2)/d^2$ and $s^2(\sigma_v^2\sigma_{\Delta t}^2 + \sigma_{\Delta t}^2\mu_v^2)/d'^2$, respectively, prove our
claim (i.e., if d < d', then $\sigma_h^2 > \sigma_{h'}^2$).

To give a realistic example, let s = 60 mm, d = 0.5 mm,
$\mu_v$ = 1 m/s, and $\mu_h$ = 30 cm (corresponds to a thin lens system with
focal length equal to 50 mm). These values imply that $\mu_{\Delta t}$ = 2.5 ms.
Let $\sigma_v$ = 0.05 m/s and $\sigma_{\Delta t}$ = 50 μs. If we increase d by a factor of ten,
d' = 5 mm (with no change in the size of each pixel) and $\mu_{\Delta t'}$ = 25 ms.
Finally, using equations 11 and 13, we see that, indeed, the error in
depth is reduced slightly, since $\sigma_{h'}$ = 1.5 cm < $\sigma_h$ = 1.6 cm. Note,
however, that a rather large change in d produces a relatively small
change in depth accuracy, since $(\sigma_h - \sigma_{h'})/\mu_h < 1\%$.

It should also be noted that an increase in $\Delta t$ might cause
greater error in camera velocity (i.e., an increase in $\sigma_v$). Therefore,
an increase in d may very well not decrease the error in depth. To
demonstrate this, using the values given in the above example, say
that $\sigma_v$ = 0.05 m/s increases to $\sigma_{v'}$ = 0.055 m/s. With only this slight
change, we see that (even with the tenfold increase in d)
$\sigma_{h'}$ = 1.65 cm > $\sigma_h$ = 1.62 cm, i.e., the error in depth increases.
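
The figures quoted in the two examples above can be checked with a few lines of arithmetic. The sketch assumes that the depth variance has the standard form for a product of two independent random variables, with s and d deterministic; the function and variable names are illustrative only.

```python
import math


def depth_sigma(s, d, mu_v, sigma_v, mu_dt, sigma_dt):
    """Standard deviation of h = s*v*dt/d for independent v and dt.

    Uses Var(v*dt) = sigma_v^2*sigma_dt^2 + mu_dt^2*sigma_v^2
                     + mu_v^2*sigma_dt^2, with s and d treated as exact.
    """
    var_product = (sigma_v**2 * sigma_dt**2
                   + mu_dt**2 * sigma_v**2
                   + mu_v**2 * sigma_dt**2)
    return (s / d) * math.sqrt(var_product)


s, mu_v, mu_h = 0.060, 1.0, 0.30   # metres, m/s, metres
sigma_dt = 50e-6                   # seconds


def mean_dt(d):
    """Mean transit time implied by h = s*v*dt/d."""
    return mu_h * d / (s * mu_v)


base = depth_sigma(s, 0.5e-3, mu_v, 0.05, mean_dt(0.5e-3), sigma_dt)
wide = depth_sigma(s, 5e-3, mu_v, 0.05, mean_dt(5e-3), sigma_dt)
wide_worse_v = depth_sigma(s, 5e-3, mu_v, 0.055, mean_dt(5e-3), sigma_dt)

print(f"d = 0.5 mm, sigma_v = 0.050 m/s: sigma_h = {base * 100:.2f} cm")
print(f"d = 5 mm,   sigma_v = 0.050 m/s: sigma_h = {wide * 100:.2f} cm")
print(f"d = 5 mm,   sigma_v = 0.055 m/s: sigma_h = {wide_worse_v * 100:.2f} cm")
```

Under these assumptions the three results round to 1.62 cm, 1.50 cm, and 1.65 cm, in agreement with the values in the text. In each case the term containing the velocity error dominates, which is why a tenfold change in d has so little effect.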

There is a further cost for choosing d large. If the scene has
considerable occlusion, the pixel at position 0 may see the feature while the
pixel at position 1 does not, if, at that perspective, the feature is
occluded by some other feature. This tradeoff between accuracy and
occlusion would have to be considered for each individual
application.

In conclusion, if the scene has minimal occlusion and one
expects no increase in the camera velocity error, one might choose d
to be as large as possible to get maximum accuracy. However, in
general, one can more easily maintain the required constant camera
velocity over a shorter time period, which argues for d small.
Otherwise, one would choose pixels that are adjacent or nearly
adjacent (which is equivalent to choosing d small).

6 Conclusion 

All else being equal, it has been demonstrated that equation 1
(the geometric method) is a more accurate way to calculate depth, h,
than is equation 8 (the time differential method), assuming a flat
image plane (which will be true for the vast majority of available
semiconductor photo detector arrays). Both methods require that
the camera velocity, v, be constant over the time, $\Delta t$. $\Delta t$ can be a
relatively long time for distant features. Nonetheless, for many
applications, the variation of v during the time, $\Delta t$, will be negligible.

It is argued in section 5 that, under certain conditions (minimal
occlusion and no change in camera velocity error), there is some gain
in precision by making d (in equation 1) bigger. However, with the
time differential method of equation 8 there is a loss in making d
larger (increasing the field-of-view), as can be seen by examining
figure 2.